Linear and non-linear fusion of ALISP-based and GMM systems for text-independent speaker verification
Abstract
Current state-of-the-art speaker verification algorithms use Gaussian Mixture Models (GMM) to estimate the probability density function of the acoustic feature vectors. They are denoted here as global systems. To achieve better performance, they have to be combined with other classifiers using different fusion methods. The performance of the final classifier depends on the choice of the individual classifiers and on the fusion technique used to combine them. In our previous studies we used the data-driven Automatic Language Independent Speech Processing (ALISP) method to segment the speech data as a first step of the speaker verification task. Dynamic Time Warping (DTW) was used as the distortion measure between two speech segments, and a logistic regression function was used to determine the optimal weights of the speech segments (including “silences”). This system is denoted as the ALISP-DTW system. In this paper the focus is on the fusion techniques used to combine the ALISP-DTW and GMM systems. We show that a non-linear fusion method (Multi-Layer Perceptron) slightly improves the final fusion result compared to the linear fusion strategies.
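To make the idea of linear score fusion concrete, the sketch below combines two per-trial scores (standing in for an ALISP-DTW score and a GMM score) through a weighted sum whose weights are fit by logistic regression. This is only one possible linear fusion rule and is an assumption for illustration; the abstract does not specify which linear strategies the authors used. The function names, the toy scores, and the training settings are all hypothetical, and the non-linear (Multi-Layer Perceptron) fusion is not covered here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit fusion weights (w_a, w_b) and a bias by batch gradient descent
    on the logistic loss.

    scores: list of (score_a, score_b) pairs, one pair per trial
    labels: 1 for a target (true speaker) trial, 0 for an impostor trial
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (sa, sb), y in zip(scores, labels):
            err = sigmoid(w[0] * sa + w[1] * sb + b) - y
            grad_w[0] += err * sa
            grad_w[1] += err * sb
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b

def fuse(w, b, score_a, score_b):
    # Fused score in (0, 1); accept the claimed identity if it exceeds a threshold.
    return sigmoid(w[0] * score_a + w[1] * score_b + b)

# Hypothetical, hand-made scores (not data from the paper).
TOY_SCORES = [(1.2, 0.8), (0.9, 1.1), (1.5, 0.4), (0.3, 1.4),      # target trials
              (-1.0, -0.7), (-0.6, -1.2), (-1.3, 0.2), (0.1, -1.1)]  # impostor trials
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_linear_fusion(TOY_SCORES, TOY_LABELS)
    for (sa, sb), y in zip(TOY_SCORES, TOY_LABELS):
        print(y, round(fuse(w, b, sa, sb), 3))
```

In this toy set, some trials are weakly scored by one system and strongly by the other, which is the situation where a weighted combination can help over either score alone. A non-linear fusion would replace the weighted sum inside `fuse` with a small neural network over the same two inputs.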
Similar resources
A segmental approach to text-independent speaker verification
Current text-independent speaker verification systems are usually based on modeling globally the probability density function (PDF) of the speaker feature vectors. In this paper, segmental approaches to text-independent speaker verification are discussed. Unlike the schemes based on Large Vocabulary Continuous Speech Recognition (LVCSR) with previously trained phone models, our systems are based ...
Comparison between factor analysis and GMM support vector machines for speaker verification
We present a comparison between speaker verification systems based on factor analysis modeling and support vector machines using GMM supervectors as features. All systems used the same acoustic features and they were trained and tested on the same data sets. We test two types of kernel (one linear, the other non-linear) for the GMM support vector machines. The results show that factor analysis ...
Exploiting High-Level Information Provided by ALISP in Speaker Recognition
The best performing systems in the area of automatic speaker recognition have focused on using short-term, low-level acoustic information, such as cepstral features. Recently, various works have demonstrated that high-level features convey more speaker information and can be added to the low-level features in order to increase the robustness of the system. This paper describes a text-independen...
Linear and non linear kernel GMM supervector machines for speaker verification
This paper presents a comparison between Support Vector Machines (SVM) speaker verification systems based on linear and non linear kernels defined in GMM supervector space. We describe how these kernel functions are related and we show how the nuisance attribute projection (NAP) technique can be used with both of these kernels to deal with the session variability problem. We demonstrate the imp...
Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification
This paper presents a generalized i-vector representation framework with phonetic tokenization and tandem features for text independent as well as text dependent speaker verification. In the conventional i-vector framework, the tokens for calculating the zero-order and first-order Baum-Welch statistics are Gaussian Mixture Model (GMM) components trained from acoustic level MFCC features. Yet bes...
Publication date: 2004